Introduction to Response Time in APIs
Get to know the different time factors affecting API performance.
Motivation#
Most modern applications are data-oriented: they process data and present it to users in user-friendly formats. In dynamic applications especially, the data updates continuously. A server stores this continuously updating information and serves it whenever connected devices or clients request it. We focus primarily on the Internet in this chapter because it is the most common medium through which customers request services via APIs.
At the API design level, we must establish API SLAs that are realistically achievable with current technology and within our cost budget. For example, for voice calls over the Internet, a one-way latency of more than 100 ms will start to deteriorate the listener's experience. So, in this case, we (as API and back-end designers) have a concrete threshold to target. We then need to carefully examine, end to end (from the client to the service), how to design the system to meet that goal (latency, in the case of voice over the Internet) and, if it can't be met, how to mitigate the shortfall.
Over the years, major services like Google Search have set high performance expectations among users. API designers can't ignore these expectations, or their app might fail, because no one wants to use a slow app. The following questions, if answered properly, result in an effective customer experience:
How quickly does the API act on requests and send responses back?
How does the increasing number of requests affect the performance of an API?
Different APIs may have different latencies depending on the operations they perform. These APIs access different types of memory to save or retrieve information, which also takes time. We'll use the standard latency numbers given in the table below to derive our calculations.
Standard Latency Numbers

| Operation | Time |
|-----------|------|
| | 0.5 ns |
| | 0.9 ns |
| | 2.8 ns |
| | 10 ns–100 ns |
| | 9 μs |
| | 100 μs–1,000 μs |
| | 1 ms–10 ms |
| | 10 ms–100 ms |
| | 100 ms–1,000 ms |
| | >1 s |
Note: A region is a geographical location, a zone is an isolated location within a region, and a data center is the physical collection of resources in a zone. A region can have multiple zones, and a zone can have multiple data centers in it. Moreover, intra-zone communication refers to communication between two data centers within a zone, and inter-zone communication refers to communication between two zones within a region.
Latency vs. response time#
Sending a request and getting a response back from the server takes time. This time should be as low as possible to minimize user-perceived latency and provide a better user experience. Measuring this time is critical to monitoring API performance, which in turn drives customer satisfaction. We break it down into the following two components:
Latency (network latency) is the propagation time of a message (request and response) between the client and server, excluding the processing time.
Processing time is the time a server takes to process a request, including query execution, computation, file handling, and so on.
API response time is the time an API takes to respond to a request. It includes both network latency and processing time. It begins as the request starts and ends when the client receives the response. Although some references may use latency and response times interchangeably, they measure two entirely different time frames. The illustration given below shows the key difference between the two.
To summarize the discussion above, let's take a look at the equation below:

Response time = Network latency + Processing time
The equation above shows that achieving a smaller response time requires us to lower the latency, the processing time, or both. Latency depends on various factors, such as the distance between the communicating machines and the intermediate network components along the path, for example, caches or proxy servers. The latency of an API endpoint is generally measured with a pinging service. However, a request that pings a server takes less time (response time) than a request that must retrieve data or files from a database.
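To make the distinction concrete, here is a minimal sketch of measuring end-to-end response time using only Python's standard library. The local test server and its 50 ms `time.sleep` are invented stand-ins for real server-side processing; against a remote host, network latency would also contribute to the measured time.

```python
import threading
import time
from http import client
from http.server import BaseHTTPRequestHandler, HTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    """Toy endpoint that simulates 50 ms of server-side processing."""
    def do_GET(self):
        time.sleep(0.05)  # simulated processing time
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def measure_response_time(host: str, port: int, path: str = "/") -> float:
    """Round-trip response time in seconds: latency + processing + download."""
    conn = client.HTTPConnection(host, port, timeout=10)
    start = time.perf_counter()
    conn.request("GET", path)
    conn.getresponse().read()  # wait until the full body is downloaded
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

# Start the toy server on an ephemeral port and measure one request.
server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
rt = measure_response_time("127.0.0.1", server.server_address[1])
server.shutdown()
print(f"response time: {rt * 1000:.1f} ms")  # at least 50 ms here, since processing dominates locally
```

Because client and server share a machine here, almost all of the measured time is processing time; over a real network, the same measurement would also include propagation delay.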
Note: The appropriate response time depends on specific use cases. In general, an API is considered effective if it has an average response time between 0.1 and 1 second. For example, for a multiplayer online gaming service, a response time of 500 ms would not be optimal.
Factors affecting response time#
In this section, we’ll look at the different factors affecting the response time of APIs. Some of the factors that can affect the processing time of a server are defined in the table at the beginning of the lesson. Let’s look at the other key factors in calculating the response time.
Let's suppose the client/browser does not know the server's IP address. The request first goes to a DNS server, which resolves the domain name to an IP address. The client then performs a TCP handshake with the server and, once the connection is acknowledged, an SSL/TLS handshake in which the server presents its certificate to establish a secure channel. Next, the HTTP request is forwarded to the destination server. The server processes the request by performing the required operations, storing data in or retrieving data from the database. Finally, the client receives the response, indicating whether the request succeeded and carrying any retrieved data.
The following illustration gives an overview of the events that occur during an API request and response.
We break down the response time into the following segments, as depicted in the illustration:
DNS lookup is the time to resolve the IP address against a domain name through the DNS server.
TCP handshake is the time to establish an initial connection between the client and server.
SSL/TLS handshake is the time to create a secure communication channel for data exchange.
Transfer start is the time to acquire the first byte of the requested data in the response message. It includes both the round-trip time of the request (GET/POST) message and the processing time at the server end.
Download is the time taken by a client to fetch the complete data.
Note: The combined DNS lookup, TCP handshake, and SSL/TLS handshake time will be referred to as the base time throughout the course.
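The segments above can be measured individually. The following sketch, using only Python's standard library, times each phase (DNS lookup, TCP handshake, TLS handshake, transfer start, and download) by performing them one at a time instead of relying on a high-level HTTP client. The hand-written HTTP/1.1 request is a simplification for illustration.

```python
import socket
import ssl
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

def breakdown(host: str, port: int = 443) -> dict:
    """Time each phase of an HTTPS request to host, in seconds."""
    timings = {}
    # 1. DNS lookup: resolve the domain name to an IP address.
    infos, timings["dns_lookup"] = timed(
        socket.getaddrinfo, host, port, type=socket.SOCK_STREAM)
    ip = infos[0][4][0]
    # 2. TCP handshake: establish the initial connection.
    sock, timings["tcp_handshake"] = timed(
        socket.create_connection, (ip, port), 10)
    # 3. SSL/TLS handshake: negotiate a secure channel.
    ctx = ssl.create_default_context()
    tls, timings["tls_handshake"] = timed(
        ctx.wrap_socket, sock, server_hostname=host)
    # 4. Transfer start: send the request, wait for the first response byte.
    request = f"GET / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
    tls.sendall(request.encode())
    _, timings["transfer_start"] = timed(tls.recv, 1)
    # 5. Download: read the remainder of the response.
    def drain():
        while tls.recv(4096):
            pass
    _, timings["download"] = timed(drain)
    tls.close()
    return timings

# Example (requires network access):
# for name, t in breakdown("example.com").items():
#     print(f"{name:>14}: {t * 1000:7.2f} ms")
```

Summing the first three entries gives the base time defined in the note above; the transfer-start entry bundles the request's round trip together with the server's processing time, just as described in the list.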
We can obtain the latency of a request by excluding the server's processing time, using the following general equation:

Latency = Response time − Processing time
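As a worked example of this equation, the helper below subtracts a server-reported processing time from a measured response time. How the server reports its processing time is an assumption here; some services expose it in a response header, but no particular header is implied by this lesson.

```python
def network_latency(response_time_ms: float, processing_time_ms: float) -> float:
    """Latency = response time - processing time (both in milliseconds)."""
    if processing_time_ms > response_time_ms:
        raise ValueError("processing time cannot exceed total response time")
    return response_time_ms - processing_time_ms

# Example: a 180 ms response where the server reports 120 ms of processing
print(network_latency(180.0, 120.0))  # → 60.0 ms of network latency
```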
Quiz
Question
Let’s suppose the client is accessing a service from different regions. Will the response time be the same in different regions for a particular server?
The response time can vary greatly depending on the region in which the requests are made. A user who initiates a request near the data center may receive a response quicker than someone far away. Therefore, to keep the service response time within an acceptable range, replication of servers across the world is essential.
In the coming lessons, we’ll calculate the response time of an API by estimating the latency and processing time.